
test(torch): skip high-memory cuda svd reference #628

Draft
zhangyue207 wants to merge 1 commit into InfiniTensor:master from zhangyue207:test/skip-high-memory-torch-svd

@zhangyue207
Collaborator

Summary

  • Skip the CUDA svd generated torch-op case for the generic (4, 4, 5632) shape before calling the PyTorch reference.
  • Keep the rest of the generated torch-op coverage unchanged.
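The skip condition described above can be sketched as a small predicate that the test harness consults before invoking the PyTorch reference. This is an illustrative sketch only: the names `_HIGH_MEMORY_REFERENCE_CASES` and `should_skip_reference` are hypothetical and are not taken from `tests/test_torch_ops.py`, whose actual implementation may differ.

```python
# Hypothetical sketch; the real harness in tests/test_torch_ops.py may be
# structured differently.

# (device type, op name, input shape) triples whose PyTorch reference call is
# known to exhaust GPU memory on CI runners.
_HIGH_MEMORY_REFERENCE_CASES = frozenset(
    {
        ("cuda", "svd", (4, 4, 5632)),
    }
)


def should_skip_reference(device, op_name, shape):
    """Return True when the reference call for this case should be skipped."""
    # Normalize "cuda:0" to "cuda" so that the device index does not matter.
    device_type = str(device).split(":", 1)[0]
    return (device_type, op_name, tuple(shape)) in _HIGH_MEMORY_REFERENCE_CASES
```

In a pytest-based harness, a check like this would presumably run before the reference is computed, for example `if should_skip_reference(device, op_name, shape): pytest.skip("PyTorch reference needs too much memory.")`. Matching on the full triple keeps every other device, operator, and shape combination in the generated coverage.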

Motivation

NVIDIA CI repeatedly failed in unrelated upstream PRs because the PyTorch svd reference path for (4, 4, 5632) consumed about 72 GiB on 80 GiB runners, leaving too little memory for the remaining pytest workers and causing CUDA OOM. The InfiniOps wrapper is not being exercised meaningfully in that case because the reference path itself is the unstable part.

Closes N/A

Type of Change

  • feat — new feature / new operator / new platform
  • fix — bug fix
  • perf — performance improvement (no behavioral change)
  • refactor — code restructuring without behavior change
  • test — adding or fixing tests only
  • docs — documentation only
  • build / ci — build system or CI configuration
  • chore — tooling, formatting, or other non-code changes
  • Breaking change (requires a ! in the Conventional Commits prefix or a BREAKING CHANGE: footer)

Platforms Affected

  • CPU (WITH_CPU)
  • NVIDIA (WITH_NVIDIA)
  • Iluvatar (WITH_ILUVATAR)
  • MetaX (WITH_METAX)
  • Cambricon (WITH_CAMBRICON)
  • Moore (WITH_MOORE)
  • Ascend (WITH_ASCEND)
  • PyTorch C++ bindings (WITH_TORCH)
  • Build system / CMake / CI
  • Python bindings / user-facing API

Test Results on Supported Platforms

| Platform | Built | pytest Result | Notes / Hardware |
| --- | --- | --- | --- |
| NVIDIA | N/A | Pending CI. | Fix targets repeated CUDA OOM in tests/test_torch_ops.py::test_op[...,4x4x5632-svd]. |
| CPU | N/A | ruff format --check, ruff check, and git diff --check passed. | Python test-only change. |
Full `pytest` output (optional)
python -m ruff format --check tests/test_torch_ops.py
# 1 file already formatted

python -m ruff check tests/test_torch_ops.py
# All checks passed!

git diff --check
# passed

Benchmark / Performance Impact

N/A

Notes for Reviewers

  • The skipped case is limited to cuda, svd, and shape (4, 4, 5632).
  • This avoids the recurring OOM in the PyTorch reference, which occurs before the generated InfiniOps wrapper comparison can complete.

Checklist

Title, Branch, and Commits

  • PR title follows Conventional Commits (e.g. feat(nvidia): …, fix(cuda/gemm): …).
  • Branch name follows <type>/xxx-yyyy-zzzz where <type> matches the PR title's Conventional Commits type and words are joined with hyphens (see CONTRIBUTING.md §Branches).
  • Each commit message follows Conventional Commits.
  • Each commit is meaningful, well-formed, and independently reviewable (see CONTRIBUTING.md §Pull Requests).
  • No stray merge commits from master — the branch is rebased cleanly on top of the current master.
  • No fixup! / squash! / wip commits remain.

Scope and Design

  • Changes are minimal — nothing unrelated to the stated motivation was added (CONTRIBUTING.md §Code/General).
  • No dead code, commented-out blocks, debug prints, printf/std::cout/print(...) left behind, or TODO without an owner and issue link.
  • No unrelated formatting churn that would obscure the diff.
  • N/A: Public API changes. This PR changes only test skip policy.

General Code Hygiene (applies to all languages)

  • The code is self-explanatory; comments were added only where the why is non-obvious (CONTRIBUTING.md §Code/General).
  • Every modified or added file ends with a single trailing newline (CONTRIBUTING.md §Code/General).
  • No trailing whitespace, tab/space mixing, or stray BOMs.
  • Identifiers in comments and error messages are wrapped in backticks where applicable.
  • All comments and error messages are in English.
  • Comments and error messages are complete sentences.

C++ Specific (if C++ files changed)

  • N/A: No C++ files changed.

Python Specific (if Python files changed)

  • Python files follow PEP 8 and project formatting.
  • ruff format --check passed for touched Python files.
  • ruff check passed for touched Python files.

Testing

  • Fresh platform test run completed. Pending CI.
  • For any platform that could not be tested, an explicit reason is given in the table.
  • New functionality has matching tests under tests/. This PR updates an existing generated test harness skip condition.
  • Pytest parameterization remains deterministic and scoped.
  • N/A: pytest.mark.auto_act_and_assert. No operator test was added.
  • N/A: Default dtype / device parameterization. No parameterization was changed.
  • Flaky test behavior is documented in the PR motivation.
  • Bug-fix regression test. The skipped case is the repeated OOM case observed in CI.

Build, CI, and Tooling

  • N/A: Fresh platform build. This PR changes only Python tests.
  • N/A: compile_commands.json. This PR does not change CMake configuration.
  • N/A: New backend or device auto-detection. No backend was added.
  • N/A: CUDA-like mutual exclusion. This PR does not change backend selection.
  • N/A: CI matrix generation. This PR does not change CI configuration.
  • N/A: Runtime dependencies. No runtime dependency was added.

Documentation

  • N/A: User-facing documentation. This PR changes only tests.
  • N/A: New operators, dispatch helpers, or public utilities. None were added.
  • N/A: Breaking change. This PR has no user-visible API impact.

Security and Safety

  • No secrets, access tokens, internal URLs, customer data, or personal hardware identifiers have been committed.
  • N/A: Third-party code. No third-party code was added.
  • N/A: Unsafe pointer arithmetic, uninitialized reads, or missing bounds checks. No source code was changed.
